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PARALLEL INDEX MAINTENANCE base may assign any work granule to a process on any node. 

regardless of the location of the disk that contains the data 

HELD OF THE INVENTION that will be accessed during the work granule. 

The present invention relates to database management Shared-disk databases may be run on both shared-nothing 

systems and more particularly to updating indexes in ^ sharedndisk computer systems. To run a ^ared-disk 

response to paraUel execution of data manipulation opera- database on a shared-nothing computer system, as shown m 

^Qjjg FIG. 1, software support may be added to the operating 

system or additional hardware may be provided to allow 

BACKGROUND OF THE INVENTION processes to have direct access to remote disks. For 

10 example, software support may allow slave W 131 on node 

To fully utilize the computing power of a multi- ui to have direct access to disk bank 153 on node 113. 

processmg system, a larger task (a "parent task**) may be . , , , , * * i i j- i 

J- -J J • * n * I /M„ 1 I >n u- u .u In general, however, a node s access to its local disks may 

divided into smaller tasks ( ^»^rk granules 7 which are then . ^ <i • » *u . * ^ t r? 

J. , T_ ^ , ^ r« 1 • he more efficient than its access to remote disks. For 

distributed to processes ( slaves 0 ninning on one or more , ^ ^<^> « ^- i u i • 

**u- - J J example, node Ills access to disk bank 151 is more 

processmg nodes. The slaves execute their assigned work is re • * lu ^ . i u i iei a ^ • 

^ . r 11 I *u *u 1 ' \i » efficient than node 111 s access to disk bank 153. A node is 

granules m parallel with the other slaves, causmg the parent ... « « -^^j c_ *t- j . * j *i„ j . 

r . 1 . r . .1. r * 1 said to have an "afSnit/ for the data stored on the node s 

task to complete faster than if it were executed by a single j^^^ disks 

process. Each node may contain multiple processors and 

multiple concurrent processes. The process that divides RELATIONAL STORAGE 

parent tasks into work granules and distributes the work 20 

granules to processes on the various processing nodes is Relational databases store information in indexed tables 

referred to herein as the coordinator process or "master**. that arc organized into rows and columns. FIG. 2 illustrates 

a sample table for an example relational database. Each row 

MULTI-PROCESSING SYSTEMS as210 in the table 200reprcsents an individual record. Each 

Multi-processing computer systems typically faU into column 220 in the table r^resents a different kind of 

three categories: shared everything systems, shared^isk information or "attribute that is of mterest for all the 
systems, and shared-nothing systems. The constraints placed For example. Table Emp 200 stores einployee 

on the coordinator process during the work granule distri- which contam oohimns 220 that correspond to the 

bution process vary based on the type of multi-processing following attnbutes: employee name ^mpnajme) employee 
system involved In shared^verything systems, processes 30 uumber^mpno) social secimty number^ 

on aU processors have direct access to all dynamic, i.e., number (D^tno), Salary, job Utle (Tid^ date of cmpb^^ 

volatile, memory devices (hereinafter generally referred to !^ff (^t^^^^)' ^ ^1"™^ V^' 

as "memory") and to aU static, Le., persistent, memory ^ ^27, respectively. Each row 210 m the Table Emp 

devices (hereinafter generally referred to as "disks") in the 200 stores the same attributes for each mdividual employee, 
system ^ 35 one attribute per column 220, but the values of an attnbute 

* . , , stored in a column 220 may change. In this example, an 

In sharedndisk systems, processors and memones are employee record is uniquely identified by a social security 
grouped mto nodes. Each node in a shared^hsk system inay ^23, or an employee number, Empno, 

Itself constitute a sh^-eveiythmg system that mcludt^ ^j^^ 222. In tables where none of the attributes are 

multiple processors and maltq)le memones. Processes on aU ^ ^ ^^^^^^^ a ROWID may be gener- 

processors can access all di^ m the system, but on^y the ^ ^ ^ ^^^^^^ 

processes on prc^^rs that belong to a particular node can ^^^^ ^^^^ ^ ^^^^ ^ ^ converted 

directly access the memory withm the parUcular node. ^ ^^^^ ^ ^^^^ application program, which then 

In shared-nothii^ systems, all processors, memories and submits the queries to a database server. In response to the 

disks arc grouped into nodes. In shared-nothing systems as queries, the database server accesses the tables specified by 

in shared-disk systenos, each node may itself constitute a the query to determine which information within the tables 

sharcd-cvcrything system or a shared-disk system. Only the satisfies the queries. The information that satisfies the que- 

processes runnir^ on a particular node can directly access ^es is then retrieved by the database server and transmitted 

the memories and disks within the particular node. Of the to the database application and ultimately to the user, 
three general types of multi-processing systems, shared- ^ ^^^^^ ^^^^ a table in an 

nothmg systems typically require the least amount of wirmg ^^^^ form. As recordfe are entered into a table, they are 

between the vanous system components. ^^^^ ^^^^^^ ^^^^^ ^ ^ non-volatile, 

FIG. 1 illustrates an exampb shared-nothing muUiproces- j e., persistent, storage device, such as a fixed disk drive or 

sor system with four nodes 110 including three banks of optical device. Such a location can often be, relative to the 
disks 150. Disk banks 151, 152 and 153 are local to nodes 55 focation of the previous record, at a non-contiguous storage 

111, 112 and 113, re^ectively. A coordinator process, master sector of the persistent storage device. Over time, as records 

120 running on node 113, has spawned four slaves, slave W arc added or dropped, the physical arrangement of the data 

131, slave X 132, slave Y 133 and slave Z 134 running on in the persistent storage device usually does not corrcspond 

nodes 111, 112, 113 and 114, respectively. to the order of vahies in any of the attributes of the table. The 

ARFH nT^K/*;H ARFD NOTHTNfi ^ ^^^^ consecutive rows may appear to be randomly spread 

SHARED DISI^SR^ED NOTHING ^ ^^^^ ^^^^ ^ persistent storage device. 

Consequently, it is not always possible to directly access or 
Databases that run on multi-processing systems typically retrieve the record or range of records that satisfy a given set 
fall into two categories: shared-disk databases and shared- of search criteria. For example, in order to locate all rows in 
nothing databases. A shared-disk database expects all disks 65 a table that have a given value in a column A, every row of 
in the computer system to be visible to all processing nodes. the table must be fetched and column A of each row 
Consequently, a coordinator process in a shared-disk data- examined. Even when a row with the target value in column 
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A is found, the remainder rows in the table must be fetched used to store a leaf need not be filled by the index entries of 

and column A examined, unless the values in cohimn A are the leaf. For example, part of the block may be empty to 

established to be unique. accommodate later additions to the leaf. When the leaf does 

Another problem associated with data retrieval is that, not fill the block, the bits in the unused portion of the block 

typically, data for a particular row is stored in one or more 5 are simply ignored. When an attempt is made to insert into 

units of contiguous blocks in a persistent storage device. A a leaf more information than will fit on a block, the value 

block is the smallest quantity of data that can be read from ranges associated with the leaf and one or more other leaves 

a persistent store into dynamic memory. If a database system are revised, and additional leaves may be created, to redis- 

requires any information stored in a particular block, the tribute the index entries across the leaves in a way that 

database system must read the entire block into memory. To lo prevents any individual leaf from being overloaded. Each 

retrieve values for a target column of a table, the database leaf also contains a pointer or other link to the subsequent 

system must read the entire block or all the blodcs that have leaf. For example, leaf 310 points to leaf 320. 

any data firom that column of the table, rather than reading x^e other elements in the B-trcc hierarchy arc the 

only that portion of the block or blocks that conUin values branches. Branches contain information that indicates at 

from the target column of the table. Since values for the is least one range of values for the index key, and, for each 

target column may be present in all or almost all the blocks range, a pointer or other link to the next lower element of the 

of a table, the entire table or significant portion thereof must hierarchy, whether a leaf or another branch. The branch at 

be read into memory in order to retrieve the column values. the top of the hierarchy is called the root of the B-tree. By 

In such a case, the database server, in response to a query, foUowing the links from the root branch using the pointers 

performs a "full Uble scan" by fetching every row of the 20 associated with the range that includes the index key value 

table and examining the column or columns referenced in of interest, the leaf containing that value can be found. The 

the search criteria specified in the query. This retrieval can beginning of the ranges may be implied, and only the end of 

be very costly because, if the amount of the daU for the ranges need be specified in a branch, as in FIG. 3. For 

columns not used in the query is very large, then the full example, in FIG. 3, the root 302 indicates two ranges, the 

table scan methodology becomes very inefl&cient due to the 25 fijst for index key Deptno values from 0 to 20, and the 

unnecessary amount of disk input/output. second for values from 21 to 40. If a Deptno value of 11 is 

INDEXES interest, the database server would follow the link asso- 

j . , . V . . • r«- • ciated with the range from 0 to 20, which indicates branch 

Acconlingly, m one approach to imptovuig the efficiency 3^ 3^^^ ^^^^ ^ is from 0 to 11 

of data retnevaU databi^ systems provide mdexes to 30 ^i„,s.to » block of storage containing leaf node 310. 

increase the speed of the data retneval process. A daUbase ^.^J^ ^ ^ ^ block310 

mdex IS coDceptuaUy smiUar to a nonnal index found at the ^^^^^ ^ ^^j^^ ^ ^^^^ ^^^j^ ^ 

end of a book, m that both kinds of mdexes comprise an ^. j * a. .i. j **u-i * Auu^.T^xnn \ 

J J 1- . r * r ^ • J I r the sccoud entry from the end of the leaf. Although rIG. 3 

ordered list of information acoompamed with the location of j -^^ i r j * t * • j *^ c 

. , t. • 1 r * ui depicts leaf nodes that have separate mdex entries for every 

theinformation. Values in one or more columns of a table are -x^ / *u u u . ♦u^ „ » i, ' 

J , . J . - . - • . ■ J . 1 £_ .t_ row (even though some rows have the same mdex key 

stored in an index, which is maintained separately from the i \ * „ :ui t? « ™ i 1 

^ 1 J * u * ui /*u J 1. • i! . Li \ r« value), other arrangements are possible. For example, a 

achial database tabic (the underlying base table). The ^ ^ 

ordered hst of mformatioD in an index aUows for quick ^^^^ ^ ^y^^^^ by a list of 

scanmng to find a target value or range of values. Moreover, ..^ ^ .u -J *u • u *u** i« 

^ 1 ■ J . 1 .t. 1. row identifiers for the rows that have that key value, 
smce a conventional index stores only the values from one ^ 

or more columns that serve as the key to the index, a pointer DML OPERATIONS 
and a unique row identifier if necessary, the number of 

blocks of data being read into memory is significantly Data in indexed tables, sudi as Table Emp 200 in FIG. 2, 

reduced as compared to a full table scan. are manipulated with a set of conunands which can be called 

One structure that may be used for an index is a B-tree 45 Data Manipulation Language (DML) commands. Topically, 
strucmre. The logical layout of a sample B-tree 300 for Table the DML commands supported by database systems include, 
Emp 200 is illustrated in FIG. 3 for a case in which the key for example, commands to delete rows, insert rows, and 
to the index, i.e., the index key, is Depmo, column 224. A update rows. The i^)date row operation is often implemented 
B-trce index is a hierarchical arrangement of two types of as a delete row followed by an insert row. The delete row, 
elements: leaves and branches. Leaves reside at the towest 50 i°sert row, and update row operations are referred to here- 
level of the B-tree hierarchy and are associated with a range inafler as insert, delete, and update, 
of index key values. A leaf contains index entries for the In a multiple-Qode system, to make use of the multiple 
rows that have index key values in the key value range nodes, database servers have been designed to support 
associated with the leaf. Each entry in a leaf contains a key partitioned tables and parallel processing of DML com- 
value and a unique row identifier that is used to locate, 55 mands. In a partitioned table, the rows of the table are 
within the table, the row associated with the entry. grouped into distinct partitions that may be spread over 

For example, FIG. 3 shows four leaves 310, 320, 330 and multiple nodes. For example, FIG. 1 illustrates a database 

340 that collectively hold the index entries associated with table that is partitioned into three partitions. Partition A 161, 

index key values (department numbers) ranging from 10 lo Partition B 162 and Partition C 163 on the disk banks 151, 

40. Specifically, leaf 310 holds the index entries for the rows 60 152 and 153, respectively, local to nodes HI, 112, and 113, 

with index key (Deptno) vahies 10 and 11, Associated with respectively. 

each index key value 311 is an employee number, Empno, The division of rows into partitions is usually based on the 

from cohimn 222, which serves as the unique row identifier value of a table partition key defined by one or more 

312. In other approaches, the rowid itself serves as the columns of the table. For example, the Table Emp 200 can 

unique row identifier. 55 be divided among the three disk banks using employee 

In some database implementations, one B-tree leaf is number, Empno, in column 222 as the partition key. For 

stored per block of persistent storage. However, the block purposes oi illustration, Empno values from 1-100 may be 
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Stored in Partition A on disks 151 local to node 111, records the operation and, coosequenlly, the more eflScient the 

of employees 101 to 200 may be stored in Partition B on operation. Unfortunately, when the index key of an index is 

disks 152 local to node 112, and rows for employees 201 to not the partitioning key used distribute the work of a PDML 

300 may be stored in Partition C on disks 153 local to node operation, the changes that any given slave has to make to 

113. 5 the global index arc not likely to be clustered. For example. 

Each node can maintain an index for the rows in its own a particular slave may be assigned rows that corrc^ond to 

partition of the table. Such indexes, which only index the a particular range of Empno vahics. The rows that fall within 

rows that belong to a partition of a table and not the entire that range of Empno values may have Deptoo values that 

table, are referred to as local indexes. Al least one of the correspond to index entries that are randomly distributed 

nodes, e.g. node 113, may maintain a global index 170 for 10 within the leaves of index 200. Without clustering, the 

aUtherowsofthcEmptable.Aglobalindexisanindexthat benefits of caching are reduced. 

contains index entries for all rows of the table. With reject to contention, different slaves may have to 

contend for the same global index block at the same time. 

PARAIXEL DML OPERATIONS For example, different slaves may be updating table rows 

, „ , . . c that correspond to index entries in the same index block, and 

In aparaUelprocessmgsystem^DMLoperaUonsareoftcn ^^^^^^ ^j^^^ ^^^^ ^ ^^^^ 

divided mto woric granules which are spread across the ^^^^ ^ shared^verything system, memory is shared 
^^^^r.^^^r^^^^^'^ parallel exeaition. Such ParaUel block contention causes slaves to wait for each other to 
DML (PDML) opcratio^ may be used where the database ^^^^ ^ UtcWock manager to 
j^ffispartitioned orw^^ ^ make the block available to the slave for the desired opera- 
bulk, or both. Abulk DML operaUon is a set of related DML ^^^^ ^ ^^^^ ^^^^ requesting the block 
operations (hat affects a large numberof rows, such as when ^ the^latchAock on the 
the salary of every employee m a 40,000 employee company bjQck 

is raised a given amount in an employee database. * , 

* nT^»#T nT\%jn * On a shaTcd-di^c or shared-nothing system, block con- 

During PDML ope«UODS,t^ PDML master proce^. 25 ^ block pinging. On sich a system, the 

herein called the Coordmator Process. CP. distributes DML ^ Uie seoo^ node is granted only after 

operations among several slaves on ooiies^g nodes J g^, ^ 

based, at least partly on a key which is terem caUed a ,^ ,^ ^ 

paruuon keyjf the table invoked m the PDML operadon is ^i^^/^^^s the persistent storage device to retrieve the block 

staocally pamuoned, the partiUon key used to divide the 30 j„ ^^ ^^^^^ 

work granules of the PDML operation may be the same as c * j n ♦u „ *u i i _ ♦ ♦ .u^ ui«.,u 

. . , J. .1. . t_i >f . i_i • . first node will then request the lock manager to get the block 

the key that was used to partiUon the ^ble If the table is not ^^^^ ^^^^ ^ ^^^^ 

partiuoned, Uicn the partihon key refers to the oohiinns, if ^^^^^ ^^^^ ^^^^^ 

any,uscdtodividethcPDMLworkgranulesfordistnbution ^^^^^ ^^^^ ^^^^^^^ ^^^^^^^ l^^l^ 

among slaves. 35 njanager wfll allow the first node to obtain a lode on the 

UPDATING INDEXES block. The same block can thus be shuttled back and forth 

between the persistent storage device and the nodes that are 

An index must also be updated to reflect changes to the executing the alternating two (or more) contending slaves, 

table upon which it is built when (1) data in the one or more Further, it is possible that none of the contending slaves are 

columns serving as an index key of the index are changed, ^ on the node for which the persistent storage device is local, 

or (2) data in the one or more columns serving as the unique If none are local, transmissions and extra overhead to read 

row identifier used by the index are changed. For example, or write to a noQ-local disk are also required, consuming 

index 300 would have to be revised if values in the Deptno even more system resources. 

column 224 of Uble 200 are changed. Por example, when the persistent storage device is a disk. 

When the operation being performed on a table is a each block ping implies two disk input/output (I/O) 

PDML operation, each slave involved in the PDML opera- operations, one flu^ to the disk from the cache of the first 

tion is responsible for modifying the global index to reflect node, and one read back from the disk into the cadie for the 

the changes made by the slave to its assigned partition of the second node. Block pinging is worse than the delays expe- 

table. However, if the key upon whidi the global index is rienced by a shared-everything system because block ping^ 

built is not the same as the partition key used divide the not only delay results but also consume system I/O 

PDML operation, then maintaining global index becomes a resources. These block pings make the PDML with a global 

bottleneck. For example, updating index 300 may become a index typically worse for shared-noUiing and shared disk 

bottleneck in a PDML operation perfbnned on table 200 if systems than for a shared everything system. 

Deptno is not the partitioning key used to distribute the work por example, assume that table 200, partitioned by Empno 

granules of the PDML operation. as shown in FIG. 1, requires a bulk update setting Deptno= 

Specifically, under these conditions, maintaining a global Deptno-t-1 for all records. To perform the update, the master 

index during a PDML operation results in a loss of process, CP, forms three work granules to distribute to three 

clustering, contention for resource locks, and block pinging. PDML slaves. Slave W 131 on node IH updates Partition A 

The proportional gains expected from parallel operations, ^ that contains rows with Empno values firom 1 to 100, slave 

e.g., halving the duration of operations when the number of X 132 on node 112 updates Partition B that contains rows 

processors used is doubled, are not observed. The observed with Empno values from 101 to 200, and slave Y 133 on 

gains are less, i.e., scaleable parallelism is not attained. node 113 updates Partition C that contains rows with Empno 

Qustering refers to the physical nearness to each other of values from 201 to 300. The same slaves will update the 

the data items upon which a process operates. In general, the 65 local indexes associated with their partitions, 

more closely dustered the data items that are accessed If a global index 170 maintained on disk bank 152 of node 

during an operation, the fewer the disk accesses required by 112 uses Deptno as an index key, that index also must be 
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Updated because the values of Deptno have been changed. In another a^ect of the invention, techniques for main- 
Using conventional PDML techniques, slave W 131 updates taining a global index of a table during parallel data manipu- 
thc global index 170 for employees 1 to 100, slave X 132 lalion operations involve a coordinator process, data 
updates the global index 170 for employees 101 to 200, and manipulation slaves and index update slaves. The coordina- 
slave Y updates global index 170 for employees 201-300. 5 tor process distributes data man^julatioo operations among 
FIG. 3 shows that to update Deptoo for employee 100, slave a plurality of data manipulation slaves. Each data manipu- 
W 131 must read leaf block 310; and, to update employee lation slave performs data manipulation operations on rows 
101, a different slave, slave X 132, also must update leaf of the table and sends an index maintenance record contain- 
block 310. Because slave W 131 does not know about the ing an index key value to the coordinator process. The 
update being made by slave X 132, the two updates are not 10 coordinator process receives index maintenance records 
coordinated and two separate reads of block 310 into cache from the plurality of data manipulation slaves, computes a 
and two separate flushes from cache occur. The benefits of plurality of index key value ranges, and sends each range to 
caching are lost. a respective index update slave. Each index update slave 
On a shared-nothing system, as in FIG. 1, after slave X reads a index key value from a current index maintenance 
132 has completed the update of Deptno for Empno "101 " is record, and updates the global index using the current index 
and before slave X 132 has updated Deptno for Empno maintenance record if the index key value falls within the 
"102," the block may be requested by slave W on node HI. range of values assigned to the index update slave. 
Then node 112 flukes the blodc from its cache and writes in another aspect of the invention, techniques for using a 
the block to disk. Then node 111 causes a read of the block global index includes performing a data query using the 
from disk and trananission of the block to node 111 where 20 global index to provide a query result. If an out-of-date flag 
it is placed in the cache of node 111; and the Deptno value indicates the global index is not current, then a set of sorted 
for Empno "1" is updated by slave W 131. But before slave index maintenance records, each record containing a value 
W 131 can update Deptno for Empno "2," slave X may have of an index key for the global index and an index operation, 
requested block 310 for updating Deptoo for Empno "102." jg searched for a match to the index key values used in the 
Consequently, node 111 flushes the block from its cache, 25 data query. The query result is modified according to the 
trarismits it to node 112 which stores it on disk 152, then index operation, 
node 112 reads the block from disk into the cache on node 

112 for slave X 132. Slave X updates Deptno for Empno DESCRIPTION OF THE DRAWINGS 

"102." Slave W may then request the block again and node p^^^t invention is illustrated by way of example, 

112 again flushes the block from cache to disk 152 so node 30 ^nd not by way of limitation, in the figures of the accom- 

111 can access it. Thus block 310 pings repeatedly from panying drawings and in which like reference numerals refer 

cache to disk to cache, heavily consuming I/O resources in ^ ^^^^^ elements and in which: 

the process. j is a block diagram of a database running on a 

nie attd to achieve scaleable paraUelism thjis tra^lat« multiprocessor system according to a conventional arrange- 

to a need to update a global mdex as a result of PDML 35 j^^j^f 

operations without suffering the deficiencies of lost r-i^-*- t. .-^l- r 

r . . * r *i. t.1 1 *i„ 1 ** FIG- 2 IS a schematic diagram of an example database 

clustcnng, or contention for the same block, the latter ^ ^ ^ t j . t.* iT v j- 

, * 1 - • table Emp for an employee database upon which an embodi- 

leading to excessive waits or to block pmging. ^ eft. - ^- * 

^ r o o jjjgjjj mvention may operate. 

SUMMARY OF THE INVENTION ^ FIG. 3 is a schematic diagram of an example B-trec index 
Techniques are provided for coordinating an update of a for the example table Emp of FIG. 2, upon which an 
global index of an indexed table. According to one embodiment of the invention may operate, 
technique, a coordinator process receives index maintenance FIG. 4 is a block diagram of a computer system upon 
records from a plurality of data manipulation slaves for the which an embodiment of the invention may be implemented, 
indexed table. Each index maintenance record iochides a 45 pjc^ 5 ^ flowchart of a coordinator process according 
value for an index key of a gtobal index of tiie table. The ^ an embodiment of the present inventioa 
coordinator piocess computes a plurality of ranges of index f,g. 6 is a schematic diagram of an example set of index 
key values, and ^igr« to each of a plurality of mdex update maintenance records for coordinating index updates accord- 
slaves those records that faU in each range of the ph^^^ ^ embodiment of the present invention, 
ranges. Each mdex update slave updates the global index 50 r^^^- 

bas^d on the records ^cd to it HG. 7 is a bloA diagram of a databa« nmnmg on a 

- 7^ . . . , J multiprocessor system according to an embodiment of the 

In another aspea of the mvention, each slave reads an present invention 

index key value from a current index maintenance record, ^ _ . . _ u j * i- *u * ma r mr- c 

and updates the global index using the current index main- ^ * flowchart detailing the step 530 of FIG. 5 

tcnance record if tiic index key value falls within the range 55 to another embodiment of the present mvention. 

of values assigned to the slave. FIG. 8B is a flowchart detaiHi^ the steps 520 and 530 of 

In another aspect of the invention, techniques are pro- FIG. 5 according to an aUemativc embodiment of the present 

vided in which an index update distribution table is sent mvention. 

from a coordinator process to an index update slave process. FIG. 9 is a schematic diagram indicating the index entries 

The table maps index key value ranges to slave identifica- 60 affected among three example global indexes for one 

tions. The index update slave reads an index key value from example database, and ranges according to another embodi- 

a current index maintenance record, and locates the range in ment of the present invention. 

the index update distribution table that encompasses the Table 10 is a flowchart detailing the step 540 of FIG. 5 

value read. If the slave identification associated with the according to another embodiment of the present invention, 

located range corresponds to the identification of the slave, 6S RG. 11 is a schematic diagram of an example index 

the slave updates the global index using the current index update distribution table for coordinating index updates 

maintenance record. according to another aspect of the present invention. 
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FIG. 12A is a flowchart of a slave process according to an 
embodiment of the present invention. 

FIG. 12B is a flowchart detailing the step 1230 of FIG. 
12A according to another embodiment of the present inven- 
tion. 5 

FIG. 13 is a flowchart of a query process using a global 
index according to another aspect of the present invention. 

DETAILED DESCRIPTION OF THE 

PREFERRED EMBODIMENT lo 

A method, product and system for parallel global index 
maintenance is described. In the following descriptbn, for 
the purposes of explanation, numerous specific details are 
set forth in order to provide a thorough understanding of the 
present invention. It will be apparent, however^ to one 
skilled in the art that the present invention may be practiced 
without these specific details. In other instances, well-known 
structures and devices are shown in block diagram form in 
order to avoid uimecessarily obscuring the present inven- ^ 
tion. 

HARDWARE OVERVIEW 

FIG. 4 is a block diagram that illustrates a computer 
system 400 upon which an embodiment of the invention 25 
may be implemented. Computer system 400 includes a bus 
402 or other communication mechanism for communicating 
information, and a processor 404 coupled with bus 402 for 
processing information. Computer system 400 also includes 
a main memory 406, such as a random access memory 30 
(RAM) or other dynamic storage device, coupled to btis 402 
for storing information and instructions to be executed by 
processor 404. Main memory 406 also may be used for 
storing temporary variables or other intermediate informa- 
tion during execution of instructions to be executed by 35 
processor 404. Cbmputer system 400 further includes a read 
only memory (ROM) 408 or other static storage device 
coupled to bus 402 for storing static infiiimation and instruc- 
tions for processor 404. A storage device 410, such as a 
magnetic disk or optical disk, is provided and coupled to bus 4Q 
402 for storing information and instructions. 

Computer system 400 may be coupled via bus 402 to a 
display 412, such as a cathode ray tube (CRT), for displaying 
information to a computer user. An input device 414, includ- 
ing alphanumeric and other keys, is coupled to bus 402 for 4s 
communicating information and command selections to 
processor 404. Another type of user input device is cursor 
control 416, such as a mouse, a trackball, or cursor direction 
keys for communicating direction information and com- 
mand selections to processor 404 and for controlling cursor 50 
movement on display 412. This input device typically has 
two degrees of freedom in two axes, a first axis (e.g., x) and 
a second axis (e.g., y), that allows the device to specify 
positions in a plane. 

TTie invention is related to the use of one or more 5S 
computer systems 400 for parallel global index mainte- 
nance. According to various embodiments of the invention, 
a master process or a slave process is provided by computer 
system 400 in response to processor 404 executing one or 
more sequences of one or more instructions contained in 60 
main memory 406. Such instructions may be read into main 
memory 406 from another computer-readable mediims, such 
as storage device 410. Execution of the sequences of instruc- 
tions contained in main memory 406 causes processor 404 
to perform the process steps described herein. In alternative 65 
embodiments, hard-wired circuitry may be used in place of 
or in combination with software instructions to implement 
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the invention. Thus, embodiments of the invention are not 
limited to any specific combination of hardware circuitry 
and software. 

The term "computer-readable medium" as used herein 
refers to any medium that participates in providing infor- 
mation or instructions to processor 404 for execution. Such 
a medium may take many forms, including but not limited 
to, non-volatile media, volatile media, and transmission 
media. Non-volatile media includes, for example, optical or 
magnetic disks, such as storage device 410. Volatile media 
includes dynamic memory, such as main memory 406. 
Transmission media includes coaxial cables, copper wire 
and fiber optics, including the wires that comprise bus 402. 
Transmission media can also take the form of acoustic or 
electromagnetic waves, such as those generated during radio 
wave and infrared data communications. 

Common forms of computer-readable media include, for 
example, a floppy disk, a flexible disk, hard disk, magnetic 
tape, or any other magnetic medium, a CD-ROM, any other 
optical medium, punchcards, papertape, any other physical 
medium with patterns of holes, a RAM, a PROM, and 
EPROM, a FLASH-EPROM, any other memory chip or 
cartridge, a carrier wave as described hereinafter, or any 
other medium from which a computer can read. 

Various forms of computer readable media may be 
involved in carrying one or more sequences of one or more 
instructions to processor 404 for execution. For example, the 
instructions may initially be carried on a magnetic disk of a 
remote computer. The remote computer can load the instruc- 
tions into its dynamic memory and send the instructions over 
a telephone Line using a modem. A modem local to computer 
system 400 can receive the data on the telephone line and 
use an infrared transmitter to convert the data to an in&ared 
signal. An infrared detector can receive the data carried in 
the infrared signal and appropriate circuitry can place the 
data on bus 402. Bus 402 carries the data to main memory 
406, from which processor 404 retrieves and executes the 
instructions. The iitstructions received by main memory 406 
may optionally be stored on storage device 410 either before 
or after execution by processor 404. 

Con[q)uter system 400 also includes a communication 
interface 418 coupled to bus 402. Communication interface 
418 provides a two-way data conununication coupling to a 
network link 420 that is coimected to a local network 422. 
For example, communication . interface. 418 may be an 
integrated services digital network (ISDN) card or a modem 
to provide a data communication connection to a corre- 
sponding type of telephone line. As another example, com- 
munication interface 418 may be a local area network 
(LAN) card to provide a data communication cormection to 
a compatible LAN. Wireless links may also be implemented. 
In any such implementation, communication interface 418 
sends and receives electrical, electromagnetic or optical 
signals that carry digital data streams representing various 
types of information. 

Network link 420 typically provides data communication 
through one or more networks to other data devices. For 
example, network link 420 may provide a connection 
through local network 422 to a host computer 424 or to data 
equipment operated by an Internet Service Provider (ISP) 
426. ISP 426 in turn provides data communication services 
through the world wide packet data commimication network 
now commonly referred to as the "Internet" 428. Local 
network 422 and Internet 428 both use electrical, electro- 
magnetic or optical signals that carry digital data streams. 
The signals through the various networks and the signals on 
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.network link 420 and through communication interface 418, each available node. This way the slaves may be provided 

which carry the digital data to and from computer system with non-overlapping ranges, for example, in whicb each 

400, are exemplary forms of carrier waves transporting the index update slave can woric without danger of substantial 

information contention for the same leaf block of the index. Furthermore , 

Computer system 400 can send messages aixl receive 5 each index uptoe slave may sort M 

, ^- J ;i. u *u w \ * the values of the index key withm the range, to mcrease 

data, mchidmg program code, through the networkfs), net- . . . i ^ * r t.- ■ 

1 1- 1 J • • . _f A^o T clustcrmg and thus take advantage of caching, 

work Imk 420 and communication mterface 418. In the ^ & -& 

Internet example, a server 430 might transmit a requested INDEX MAINTENANCE RECORDS 
code or message for an application program through Internet ^^gn ^ ^^^^^ performs a DML operation on rows that 
428, ISP 426, local network 422 and communication inter- 10 ^^^^^^ ^ indexed table, the slave generates index main- 
face 418. In accordance with the invention, one such down- tenancc records that indicate how the indexes of the table 
loaded application provides for an index update slave and an ^ changed to reflect the changes made by the slave. As 
index update distribution table, as described herein. mentioned above, the modified CP receives index 
The received code may be executed by processor 404 as maintenance records from the PDML slaves in step 520. 
it is received, and/or stored in storage device 410, or other FIG. 6 shows a set of index maintenance records 630 
non- volatile storage for later execution. In this manner, according to one embodiment of the invention. In FIG. 6, 
computer system 400 may obtain application code in the each index maintenance record includes an index ID in the 
form of a carrier wave. first field 611, a key value in the second field 612, and a data 

identification value (Data ID) in the third field 613. 

FUNCTIONAL OVERVIEW ^ ^^^^ ^ ^^^^ .^^^^^^^ ^^^^j^^ 

To achieve scaleable parallelism during execution of a needs to be changed based on the index maintenance record. 

DML operation on an indexed table, the present invention For example, index maintenance records 621 to 626 indicate 

makes disjoint (1) the data sets operated on by the slave changes that need to be made to the index associated with 

processes during the table manipulation phase of the DML 25 ^^^^^ ID=1. Index maintenance records 631 and 633 indi- 

operation, and (2) the data sets operated on by the slave cate changes that need to be made to the index associated 

processes during the global index maintenance phase of the with Index ID-2. 

DML operation. According to one embodiment, this is The Key value of an index maintenance record indicates 
accomplished by adding functionality to the coordinator the index key value of the row that corresponds to the index 
process (CP) used to coordinate the parallel data manipula- maintenance record for the index identified in the Index ID 
tion operations. In some embodiments, functions are also value stored in the index maintenance record. For example, 
added to the slaves running 00 the parallel processors. The index maintenance record 621 corresponds to a row where 
CP is modified according to one embodiment of the present Deptno^lO. The index indicated in the Index ID colunm of 
invention to redistribute the index updates among the avail- index maintenance record 621 is built on the Deptno col- 
able slaves in a manner that (1) avoids contention between 35 uma Hence, index maintenance record 621 stores the Key 
the slaves for the same index block, and (2) increases the value 10. 

clustering of the index updates for which each slave is made The Data ID value of an index maintenance record is used 

responsible. In another embodiment, index update slaves to uniquely identify the row in the indexed table that 

may sort data indicating index updates to increase the corresponds to the index maintenance record. For example, 

clustering. 4q an index maintenance record that is generated in response to 

FIG. 5 is a flowchart of a coardinator process (CP) the insertion of a row X would have a Data ID value that 

according to one a^ct of the present invention. The modi- uniquely identifies Row X. To uniquely identify the row that 

ficd CP distributes data manipulation operations among corre^nds to an index maintenance record, the Data ID 

parallel data manipulation slaves (step 510), as in the field of the record contains the contents of one or more 

conventional CP. When a. PDML slave manipulates data in 45 columns, or the ROWID, that uniquely determines the 

the database that affects an index key, the PDML slave corre^ndirg row of the index maintenance record. The 

generates an index maintenance record specifying (1) the value in the Data ID field of an index maintenance record 

index key value affected and (2) the operation to perfonn on that corresponds to a row is also included in the index entry 

the index, as will be described below in more detail. By for that row, and therefore may be used to identify the index 

writing index maintenance records, the index update can be 50 entry for the row when the index key value of the row is not 

delayed to a separate phase following completion of the data unique. If the index key upon which an index is built is a 

manipulation phase. Consequently, the PDML slaves which database colunm that is subject to a uniqueness constraint, 

generated the index maintenance records do not have to then the index key value of an index maintenance record for 

apply the changes to the index. Instead, the CP receives that index imiquely identifies the row. Hence, 00 separate 

some or all of the index maintenance records from all the 55 Data ID value is necessary to uniquely identify the row, and 

PDMLslaves(step520), for example as messages transmit- the contents of the Data ID field may be the null value 

ted over the network. In this maimer the CP becomes the (NULL). 

only process that is informed of the extent to which index Each index maintenance record corresponds to a row. The 

keys must be updated. This information is used by the CP to row to which an index maintenance record corresponds is 

redistribute the index updates among the available nodes. 50 the row whose manipidatioo caused the generation of the 

As shown in FIG. 5, the CP next computes a plurality of index maintenance record. Index maintenance records indi- 

index key value ranges in step 530. One range is computed cate changes that need to be made to indexes in response to 

for each node available to perform processing on the index. changes that are made to tables. 

This may be the same as, more than, or less than the number The index maintenance records therefore also include an 

of nodes available for performing the table manipulation 65 Opcode field 614 which indicates the operation that must be 

phase of the PDML operation. The CP distributes the ranges performed on the index entry identified by Key value and 

to a set of index update slaves in step 540, one such slave on Data ID. 
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For example, suppose the database table Emp 200 illus- range is from 20 to 29; and the third range is from 30 to 39. 

trated in FIG. 2 is indexed by the Deptno column 224. Other procedures to generate the ranges are possible and 

Deptno 224 is not unique, so employee number, Empno in desirable as will be descnbed in more detail later. According 

column 222, is used as the unique row identifier. Then the to step 540, the modified CP would assign each of the three 

bulk data manipulation command to increment the depart- 5 ranges to a respective one of three index update slaves. Thus, 

mcnt number by 1, i.e., iiK:rement the contents of Deptno for example, the range 10-19 is assigned to index update 

224 by 1, will change an index key value in every row of the slave 1, the range 20-29 is assigned to index slave 2 and the 

database table Emp 200. For example, Fred (Empno "1") in range 30-39 is assigned to index update slave 3. After 

row 211 has his Deptno changed from "10** to "11." assigning a range to an index update slave, the CP distributes 

Consequently, many index maintenance records will be 10 to the slave all index maintenance records that have key 

generated. FIG. 6 illustrates six of the index maintenance values that fall within that range. Other ways to distribute 

records 621 to 626 that might be produced in response to this the index maintenance records are also possible and desir- 

operation. The change of Deptno index values associated able as will be described below in more detail, 
with Fred is indicated by a "'delete" Opcode where the Key 

value is "10" and Empno is "1" (see record 621 of HG. 6), 15 UPDATING GLOBAL INDEXES 

and by an "insert" Opcode where the Key value is "11" and upon receiving index maintenance records, the index 

Empno is "1" (see record 624). update slaves update the global index using index mainte- 

The Key value. Data ID and Opcode fields, 612, 613 and nance records in which the key value faUs within the range 

614, re^ctively, are included in conventional index main- assigned to each slave. This process is described in more 

tenance records used by data manipulation slaves. Accord- detail below. Because each sXavc has a non-overlapping 

ing to one aspect of the present invention, the index main- rangp, there is little or no contention for the same leaf 

tenance records are modified to include the index block — thus latch contention and block pinging are drasti- 

identification (Index ID) so that several global indexes on cally reduced. If each index update slave processes the index 

the same database can be maintained with the same set of maintenance records in order of increasing or decreasing key 

index maintenance records. ^5 value, then the updates to the index will be clustered and the 

For example, suppose that Table Emp ilhistrated in FIG. advantages of caching will also accrue to the process. In this 

2 is indexed separately by two columns, Deptno 224 and way, many of the disadvantages of conventional global 

Salary 225. Then, a data manipulation command that deletes index updating operations are overcome. 

aU n^cords from the database where the em^^^ EXEMPLARY PDML OPERAHON 
is November 1 of any year, will cause Betty and Ahce, rows 

213 and 216 in FIG. 2, respectively, to be deleted from the Referring to FIG. 7, the flexibility of the PDML tech- 
database table Emp 200. The required updates to maintain niques provided herein are described. FIG. 7 illustrates a 
theindexby the deletion of Betty includes deleting a Deptno shared-nothing multiprocessor system according to one 
of "12** (the value in Deptno after the bulk increment of embodiment of the present invention. In the system, a 
Deptno described above) in one global index, and deleting partitioned database table made up of partitions A 161, B 
a salary of $45,000 in the other index. Tb keep separate the 162 and C 163 is stored on nodes HI, 112 and 113, 
changes to the two indexes, the index maintenance records respectively. A global index 170 for the table is located on 
arc modified to include the Index ID 611. In FIG. 6 an Index node 112. 

ID of"!" indicates the Deptno index and an Index ID of "2" ^ ^ coordinator process (CP 750) executing on one of the 

indicates a Salary index. Thus changes to both global Qojjgs manages execution of a PDML operation performed 

indexes can be recorded in the same set of index mainte- t^ble. Slaves 731, 732, 733 and 734 are shown 

nance records 630. Additional details concerning the use of running on eadi of four processing nodes 111, 112, 113 and 

the modified index maintenance records are described later jj^4^ respectively. In the illustrated embodiment, the same 

^^ow, slaves 731, 732, 733 and 734 are used to perform both the 

data manipuilation operations during the data manipulation 
phase of a PDML operation and the global index updates 
during the glc^al index update phase of the PDML opera- 

As indicated at step 540 of FIG. 5, the CP distributes the tion. To that end, the slaves arc modified to include index 

work associated with index maintenance records to a plu- 50 update logic 741, 742, 743 and 744. Alternatively, the global 

rality of index update slaves. However, this distribution is index update phase of a PDML operation may be performed 

not made based on the same partitioning criteria that was ^>y a different set of slaves than is used to perform the data 

used to distribute the work granules during the table update manipulation phase of the PDML operation, 

phase of the PDML operation. Rather, the CP distributes the The index maintenance records 711, 712, 713 and 714 

index maintenance work granules based on the values in the 55 represent the records generated by each data manipulation 

Key value column of the index maintenance records. slave. These records are sent from the slaves that generated 

According to one embodiment, the CP determines an tbem to CP 750. Records 710 represent the set of index 

overaU range of index key values for each index from these maintenance records 710 received by the CP from the data 

records. Assume, for example, that the complete Table Emp manipulation slaves 731, 732, 733 and 734. As described 

200 has department numbers that vary from "10" to "39," so 60 above, these records 710 may inchide an Index ID to 

that the overall range of key values then would be 10 to 39. distinguish among more than one global index. 

Thetaskof modifying the index for this overall range of key When the CP distributes the data manipulation work 

values is split among several nodes. If three nodes are granules,themmiberof PDML slaves utilized may vary. For 

available, three slaves divide the task — each slave taking, example, if the PDML operations are partitioned the same 

for example, ten values (one third of the overall range). 65 way as the database table, then only three PDML slaves will 

Therefore the CP wouki define three ranges at step 530 of be used, those on node HI, 112 and 113, respectively, 

FIG. 5, where the first range is from 10 to 1^, the second because these arc the nodes with partitions of the table. Node 
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114 is not used. Alternatively, to make the most use of 
available processors, some of the PDML operations may be 
assigned to node 114. Because node 114 has less affinity for 
the rows of the database tables than the other three nodes, 
node 114 may be assigned a work granule with fewer than 
one fourth the total DML operations, according to various 
schemes for distributing tasks known in the art. 

As mentioned above, the number of slaves used during the 
data manipulation phase of a PDML operation need not 
equal the number of slaves used during the index update 
phase of the PDML operation. Thus, even if the DML 
operations are confined to nodes 111, 112 and 113, all four 
nodes may still be utilized for the index update slaves. Only 
node 112 has a strong affinity for the global index in the 
exanrple illustrated in FIG. 7. Thus, node 114 is no less 
qualified to update the global index than are nodes 111 or 
113. Consequently, even if only three PDMLslaves are used, 
the CP can choose to divide the index update task among 
four index update slaves, with index update slave 742 on 
node 112 perhaps receiving a greater share of the load 
because of its greater affinity for the disk on which the global 
index resides, again, allocating shares according to various 
schemes for distributing tasks known in the art. 

DISTRIBUTING WORK BASED ON INDEX 
KEY VALUE RANGES 

FIG. 8A is a flowchart detailing the step 530 of FIG. 5 
according to an embodiment of the present invention. 
According to this embodiment, instead of assuming an equal 
distribution of index key values between a minimum and 
maximum value, the actual distribution of key values is 
examined to achieve better loading among the index update 
slaves. 

First, a sample of S records of the index maintenance 
records is obtained 810. The number S can be all the records 
produced during the PDML operations. That is, the entire 
population of records can be sampled. Alternatively, a 
predefined large quantity that is expected to give good 
statistical representation of the population can be chosen for 
S, such as 1000 records. In an alternative embodiment, a 
constant percentage of the index maintenance records is 
utilized as the sample, e.g., every fiftieth record is sampled 
during the PDML operations. Next, the number of available 
nodes is determined to establish the number N of index 
slaves among which to divide the task of updating the index 
820. N is then the degree of parallelism for the index update. 

The number S* of index maintenance records belonging 
to a particular global index is determined in preparation for 
sorting. If there is only one global index, as with the 
conventioEial index maintenance records, then S* is equal to 
S. If there arc multiple global indexes, then there arc 
multiple values of Index IDs in the modified index mainte- 
nance records and there are two choices for S*. One choice 
is that the Index ID is added as the most significant digits to 
the Key values and all records are sorted together. In this 
case, again S* is equal to S. As will be shown later, it is 
preferable to keep different index IDs separate from each 
other and sort only within one Index ID. In such a case, S* 
would be less than S. For example, if both Depmo and 
Salary are global indexes and many table rows were deleted, 
then about half of the index maintenance records have an 
index ID for Dcptno and the other half have an index ID for 
Salary (see records 631 to 634 in FIG. 6). In this case S* is 
about S/2. The S* records are then sorted by key index value. 

The ranges are then defined by reading the key values 
associated with each multiple of S*/N from the sorted 
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records in step 840. The begiiming of the first range is 
. assumed to be zero, or any value known to be less than the 
minimum value for an index key, e.g. -9999. The end of the 
first range is the value at the S*/N location,, rounded to the 

5 next Mualler or larger integer For example, if N, the degree 
of parallelism, is 3 and S* is 299, then S*/N is 99 or 100. If 
99 is chosen, then the end of the first range is the value of 
the key index in the 99** position of the sorted S* samples. 
The computation may be repeated for every index slave as 

10 in step 850. For example, the end of the second range is at 
the position defined by 2 (S*/3) which may be computed as 
598/3ol99, i.e. the value of the key index at the 199^ 
position (2*99 =-♦198''' position would also be acceptable). 
The end of the last range would be the last sorted position 

15 (=-»3* 299 /3) or some number known to be larger than the 
largest value of the key index, such as 10,000,000,000. 
Assuming, far example, that there are many more employees 
in departments 10 and 11 than in the rest of the departments 
combined, then the 99''* position may hold the Deptno value , 

20 "10", the 199^ position may hold the Deptno value "12" and 
the last position would hold the Deptno value "40**. The 
three ranges would then be 0-10, 11-12, and 13-40 (or 
13-10,000,000,000). The three ranges derived from the 
samples would give a more even loading of the three index 

25 update slaves than the uniform distribution of 10-19, 20-29 
and 30-39 given above. The uniform distribution would put 
updates of index key values 10-19 on the first index slave, 
which would include more than half of all index mainte- 
nance records in this example. 

^ If there are multiple global indexes, then the system 
returns to step 830, sorting on the index key values of the 
next Index ID, until a set of N ranges has been defined for 
every index ID as checked in step 860. 

In a preferred embodiment, illustrated in FIG. 8B, the 
PDMLslaves do the sorting and sampling before sending the 
index maintenance records to the CP. For example, each 
PDML slave maintains a buffer for each global index 
affected by the manipulation of data in the table. During a 
Phase I, as each DML operation is performed by a PDML 

^ slave, an index maintenance record for each affected global 
index is generated and placed in the buffer devoted to that 
index. For example, three PDMLslaves affecting two global 
indexes would produce up to six buffers, two private buffers 
on each of the three nodes running PDMLdaves, each buffer 
on the same node for a different global index. 

As modified according to this embodiment, after all 
PDML operations are complete, i.e., after Phase I and during 
a Phase II, the PDML slave sorts the index maintenance 
records in eadi buffer. After sorting each buffer, the modified 
PDML slave selects a sample from each buffer according to 
one of the sampling methods, for example, selecting every 
Mth record in the buffer, where M is a constant integer, such 
as 100. The modified PDML slave then sends the samples to 

55 the CP. By having the modified PDML slaves do the 
sampling, traffic between nodes is reduced, saving system 
resources. 

In this embodiment, tbe sampling by the PDML slave 
replaces the need for sampling shown in step 810 of FIG. 
50 8A. Also, the separate buffers of the PDML slaves and the 
CP replace the need for subsampling shown in step 830 of 
HG. 8A. 

In this preferred embodiment, the maintains a private 
buffer for each global index and performs tbe steps illus- 
65 trated in FIG. 8B. In step 812, the CP inserts the records 
received from each modified PDML slave into the appro- 
priate buffer based on the global index of the index main- 
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lenance record. The CP inserts each record into each buffer update operations. For example, since the second index 

to maintain an order by index key value. After all records update slave 742 (in FIG. 7) has a greater affinity for the 

from all PDML slaves have been inserted into the buffers, global index 170 (in FIG. 7) at its own node 112, its range 

the ranges are then determined by stepping through the can be increased for that global index. Assuming the global 

sorted buffer in step 822, where the current buffer has S* 5 index 170 is the first global index 910, then the second range 

records. In step 832 the degree of parallelism for the update 912 for the second iiidex update slave 742 can be increased, 

is determined by the number of nodes, K, to be used for the for example by including a fourth change in the range, 

index update. For example, N is 4 in the system of FIG. 7. Accounting for node aflSnity in defining the ranges is 

As in other embodiments, the number N of index update reflected in step 840 in FIG. 8A and step 842 in FIG. 8B. 

slaves can be different than the number of PDML slaves. In Qne alternative allowed by the present invention is for 

step 842, the end of the index key value range for one of the assigning a separate index to each slave, e.g., to assign index 

index update slaves is then obtained by taking the record at ^ ^^^^ ^ 742^ j^dex 2. 920, to slave 2, 742, and 

the position of the next multiple of S'/N in the current CP ^^^^ 3 5,30 ^ ^^^^^ 3 743 -^us would be accomplished 

buffer for one of the global mdexes^ In ^ep 852, control ^^^^ ^^^^ significant digits to the 

returns to step 842 untd ranges are defined for all Nmdex ^ ^ 

update slaves. In step 862, control returns to step 822, to -^^ ^ , , u -m. . ^ n 1 

select the next buffe^until buffers for all the global indexes ^^P^^' ^ ^^T^^ procedure will also 

have been used. Tlie results of computing r^iiges for three W^ar to even ±c load if one mdex has no more changes 

global indexes are shown in FIG. 9. HG. 9 also ilhistratcs However, uneven loads on the index 

why sorting each global index separately is preferable. ^^P^ate slaves can persist with this method. As shown in FIG. 

FIG. 9 is a schematic diagram indicating the index entries 20 the changes to index 2 are chistered while the changes to 

affected among three example global indexes for one i°dex 1 arc not. This places a greater load on the slave 

example database. In FIG. 9 each rectangle represents a updating index 1. To see this, assume that a leaf block is 

separate global index with the horizontal position represent- filled by three entries in each of the three indexes in HG. 9. 

ing relative value of an index key. Three global indexes are Then the slave updating the first index 910 will have to read 

depicted. Each circle inside a rectangle r^resents an entry 25 make changes in every leaf block in the index, a total of 

in the index leaf blocks. Adjacent circles are likely to be in 16 disk operations (8 disk reads and 8 disk writes) in this 

the same leaf block, and separated circles are likely in example. On the other hand, the slave updating the second 

different leaf blocks. An open circle represents an index index 920 would have to read and write only four leaf 

entry that is not changed by any of the index maintenance blocks, 8 disk operations, because these changes are more 

records, while a solid fiUed circle represent an index entry 3Q clustered making better use of the node's cache. Therefore 

that is changed by at least one record of the index mainte- the load on the slave updating index 2, 920, is much less than 

nance records. the load on the slave updating index 1, 910. More even 

In the example of FIG. 9, ten index entries are changed in distribution of the load is achieved, therefore, if each slave 

each of the three global indexes, as might occur with 30 updates a range in each index; and this is the preferred 

index maintenance records without any repeated triplet of 35 embodiment. 

Index ID and Key value and Data ID. The ten changed FIG. 10 is a flowchart detailing the step for distributing 

entries for the first index 910 are widely spaced. The first the ranges to the slaves (540 of FIG. 5) according to another 

three changes are assigned to the first index update slave, the embodiment of the present invention. In this embodiment, 

next three to the next slave, and the final four to the last more than the ranges that apply to a particular slave are sent 

slave. The resulting ranges 911, 912 and 913 for the three 4Q to each node. Here an index update distribution table listing 

slaves arc about equal. The ten changed entries for index 2, all ranges for all slaves is generated 1010 and replicated 

920 arc clustercd togetbea^ in the upper half of the range of 1030 on all nodes with slaves. This enables each PDML 

index key values. The first range 921 ^lich includes the first slave or index update slave to send index maintenance 

three change covers a wide interval of index key values. records generated on its node to the slave intended to apply 

The second range, 922 including the next three changes, is 45 the record to the index, according to the coordination 

very short, and the last range 923 is intermediate in extent. provided by the modified CP. As for the individual ranges. 

Finally, the ten changed entries for the third glc^al index 930 the CP may adjust the ranges 1020 associated with each 

are clustered in two sections, one at low values of the index slave based on afSnity of certain daves for certain ranges of 

key and another at the largest valiies of the index key. In this certain global indexes, if any, using distribution procedures 

case, the first range 931, which encompasses the fiirst three 50 known in the art. FIG. 7 shows the index update distribution 

changes is short; the second range 932, whidi covers the table 723 esublished by the CP and its replications 721, 722 

wide gap between changed entries is large; and, the last and 724 on nodes HI, 112 and 114, respectively, 

range 933 is again narrow. FIG. 11 is a schematic diagram of an example index 

During step 540 (in FIG. 5) these ranges arc assigned to update distribution table for coordinating index updates 

the available slaves. One approach is to assign the first range 55 according to another aspect of the present invention. The 

of each index, i.e., 911, 921 and 931 to the first index update index update distribution table 1100 includes a phirality of 

slave (741 in FIG. 7), the second range of each index, i.e., rows or records. Each index update distribution rcoord 

912, 922 and 932 to the second index update slave (742 in represents a different range for a different index update 

FIG. 7), etc. Alternatively the ranges can be distributed slave. Each row has several fields to identify the range and 

differently. For example, to distribute the load associated 60 the slave. In the example index update distribution table 

with the third range in each index which has the extra 1100 in FIG. 11, five fields are shown, one for the Index ID 

change, the ranges can be cycled through the slaves, so that described earlier, 1111, two for the range, 1112 and 1113, one 

the first index update slave 741 gets the first range of the first for a node identification (Node ID), 1115, which uniquely 

index, the second range of the second index and the third identifies a node on the system, and one for a slave identi- 

range of the third index, 911, 922 and 933, respectively. es fication (Slave ID) which uniquely identifies the index 

It is also possible, according to the invention, to modify update slave process among all the processes running on the 

the ranges in anticipation of the slave which will execute the node. Here the contents of the Slave ID and Node ID are 
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represented by their item numbers from FIG, 7, for illustra- 
tion purposes. Not ail these fields are necessary in every 
implementation of the invention. For example, if only one 
global index is involved, the Index ID column 1111 can be 
omitted firom the index update distribution table. As another 
example, a list of ranges can be represented by one colimin 
if the ranges are non-overlapping and contiguous, so cither 
the Key start value column 1112 or the Key end value 
column 1113, but not bo±, can be omitted. Finally, other 
methods known in the art to uniquely identify a process for 
sending messages or other data to the process can be used in 
lieu of the Slave ID and Node ID shown in Table 1100. 

FIG. 12A is a flowchart of a slave process according to 
one embodiment of the present invention. The slave that 
manipulates the table (by inserting, deleting atid/or updating 
rows) also generates the corresponding index maintenance 
records and sends those records to the CP 1210. This sending 
step might be performed by the same slave that does update 
the index, i.e., the index update slave, e.g., slave 741 or this 
might be performed by a separate, modified PDML slave, 
e.g., 731. As described above, in the preferred embodiment, 
a modified PDML slave keeps all index maintenance records 
in buffers for each index during Phase I, and sorts and 
samples from the buffers in Phase II as part of the sending 
step 1210. The modified PDML or index update slave then 
receives the range or ranges for values of one or more index 
keys sent by the CP 1220. The index update slave then 
updates the index using the information in the next available 
index maintenance record according to procedures kiK)wn in 
the art, but, only if the index key value in the record, e.g., the 
contents of column 612 in FIG. 6, is within the range 
received 1230 that is associated with that index update slave. 

According to another embodiment of the present 
invention, an index update distribution table, e.g., 1100 in 
FIG. 11, is used to enable the slaves to receive the ranges, 
e.g., columns 1112 and 1113. All the modified PDML slaves 
or index update slaves may thus receive all ranges for all 
index update slaves identified, for example, in columns 1114 
and/or 1115. 

FIG. 12B is a flowchart detailing the step 1230 of FIG. 
12A for an index update slave according to another embodi- 
ment of the present invention. Id this embodiment, a data 
update distribution table has been received by a particular 
index update slave or modified PDML slave. For example. 
Table 1100 in FIG. 11 may be received by slave 741. The 
particular modified PDML slave or index update slave 741 
uses the index maintenance records available on its node 111 
that were generated there by the local PDML slave 731. The 
index update slaves use the index maintenance records sent 
there by other slaves. In this way, index maintenance records 
can come directly from other slaves and need not be sent by 
the CP. 

The particular slave 741 processes each index mainte- 
nance record, whether generated at node 111 or received 
from slaves oo other nodes. The index maintenance record 
that is currently being processed by the slave 741 is referred 
to herein as the "current index maintenance record", while 
the Key value and Index ID within that record are referred 
to herein as the "current Key value" and "current Index ID", 
respectively. 

A current Key value is obtained from the current record, 
and if available, so is a current Index ID. For example, if the 
index maintenance record 625 in FIG. 6 is the current record, 
then the current Key value is "12** and the current Index ID 
is "1." The index update distribution table 1100 is then 
checked to locate a row containing a range that includes the 
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current Key value and the current Index ID, if any (step 
1250). For example, table 1100 in FIG. 11 is checked to 
locate row 1122 which has a range 12-20 which includes the 
current Key value "12" and an Index ID "1" the same as the 
5 current Index ID. 

At step 1260, the Slave ID in the located row is compared 
against the ID of the particular index update slave. If they are 
the same, the particular slave updates the index 1270; if not, 
the index maintenance record is sent to the proper other 

IQ slave 1280. For example, the Slave ID in the bcated row is 
"742" which is not the same as the particular index update 
slave "741." Therefore, in this example, the current index 
maintenance record 625 is sent to the slave with Slave ID 
"742" (i.e. slave 742 on node 112 in FIG. 7). If the Key value 

j5 in the index maintenance record had been 11, e.g., record 
626 in FIG. 6, then the located record would have been 1121 
in the distribution table 1100, which indicates the particular 
slave 741. In this case the particular slave woidd update the 
global index according to the "insert" operation specified in 

20 the index maintenance record 626. 

In the preferred embodiment, the modified PDML slaves 
receive and use the index update distribution table in Phase 
in. In this case, the index update slave ID in the located row 
will never be the same as the PDML slave. Thus the 

25 determination made in step 1260 can be skipped, and the 
"No" path always followed to step 1280. At step 1280, the 
current index maintenance record is sent to the proper other 
slave. For example, the Slave ID in the located row is "742." 
Therefore, in this example, the current index maintenance 

30 record 625 is sent by the modified PDML slave that gener- 
ated the record, &om its buffer to the index update slave with 
Slave ID "742" (i.e. slave 742 on node 112 in HG. 7). 

Processing then continues with the next index mainte- 
nance record available on the node until there are no records 

35 left (step 1290). In this way, the index maintenance records 
are distributed to the slaves in a coordinated maimer accord- 
ing to instructions from the CP. 

In the preferred embodiment, each index update slave 
stores all the index maintenance records it receives in a 

40 buffer in sorted order, by index key value. Then the index 
update slave can perform updates to the index that benefit 
from clustering. 

DEFERRED INDEX UPDATES 

45 The techniques described above enhance scaleable paral- 
lelism. In addition, they separate the data manipulation steps 
of a PDML operation from the global index update steps so 
that the latter can be performed at a separate time, arbitrarily 
delayed after the data manipulation operations. For example, 

50 the global index update can be done overnight or at other off 
peak periods known in the art. This {Kovides useful flex- 
ibility and an added advantage for the present invention. 

After the database tables have been modified by the 
PDML slaves and before the glc^al indexes arc updated by 

55 the index update slaves, the global index is out of date. 
Indexed access to the database will have to be modified 
accordingly. According to one embodiment, (1) a flag is used 
to indicate whether a global database is out of date, (2) a 
sorted version of the index maintenance records is stored 

60 with the global index, and (3) the index maintenance records 
are checked after using the global index to correct the results 
obtained from the global index. In the preferred 
embodiment, the sorted version of the index maintenance 
records is provided by the sorted buffers of each index 

65 update slave and the index update distribution table. 

FIG. 13 is a flowchart illustrating steps for using a global 
index according to an embodiment of the invention. At step 
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1340, a query is performed using the global index. At step 
1360, a flag is checked to determine if the global index is out 
of date. If the flag indicates the global index is not out of 
date, processing proceeds normally (step 1380). 

If the global index is out of date, however, the query result 5 
is revised by searching through the set of sorted index 
maintenance records to find the unique records with indexed 
key values that have been changed (step 1370). The first 
result is adjusted according to the operations specified in the 
Opcode (614 in FIG. 6) of the index maintenance records lO 
that match the key values of the original query. 

In the foregoing specification, the invention has been 
described with reference to specific embodiments thereof. It 
will, however, be evident dial various modifications and 
changes may be made thereto without departing firom the 
broader spirit and scope of tbe invention as defined by the 
appended claims and equivalents thereto. The specification 
and drawings are, accordingly, to be regarded in an illus- 
trative rather than a restrictive sense. 

What is claimed is: 

1. A method for coordinating an update to a global index 
of an indexed table, the method comprising the steps of: 

receiving index maintenance records firom a plurality of 
data manipulation slaves for the indexed table, wherein 
each index maintenance record includes a value for an 
index key of the global index; 

computing a plurality of ranges for values of the index key 
received in the index maintenance records; 

assigning each range of the plurality of ranges to a 
respective index update slave of a plurality of index 
update slaves; and 

distributing work associated with said update to said 
plurality of index update slaves based on the plurality 
of ranges assigned to said plurality of index update 35 
slaves. 

2. The method of claim 1, wherein: 

the index maintenance records cover a plurality of global 
indexes associated with said indexed table; 

each index maintenance record includes an index identi- ^ 
fication that identifies a particular index of said plural- 
ity of global indexes; and 

the step of computing the plurality of ranges is performed 
for each index of said plurality of global indexes. 

3. The method of daim 1, wherein the step of computing 
the plurality of ranges further comprises: 

sampling a set of the index maintenance records, wherein 
a nimiber of records sampled is a sample size; and 

determining said plurality of ranges based upon index key 
values contained in said set of tbe index maintenance 
records. 

4. Tbe method of claim 3, wherein the step of determining 
said plurahty of ranges includes: 

determining a number of index update slaves, wtierein the 55 
number of index update slaves is an index update 
degree of parallelism; 

sorting values of the index key in the set of the index 
maintenance records sampled; and 

defining a range using an index key value at a specified 60 
position of the values of the index key after sorting, 
wherein the specified position is related to a multiple of 
the sample size divided by the index update degree of 
parallelism. 

5. The method of claim 4, wherein, diiring the sampling, 65 
the sample size equals a total number of records received 
from said plurality of data manipulation slaves. 
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6. The method of claim 4, wherein, during the sampling, 
the sample size is a specified quantity less than a total 
number of records received from said plurality of -data 
manipulation slaves. 

7. The method of claim 4, wherein the specified position 
is also related to an aflSnity of the respective index update 
slave for leaf blocks in the global index, the leaf blocks 
containing the values of the index key in tbe range. 

8. The method of claim 1, the distributing further com- 
prising: 

generating an index distribution table including a range 
and a slave identification for each range of the phirality 
of ranges; and 

replicating the index distribution table for each index 
update slave of the plurality of index update slaves. 

9. llie method of claim 8, wherein the index distribution 
table includes an index identification associated with each 
range, wherein the index identification identifies which 
global index, of a plurality of global indexes built on said 
table, is associated with said range. 

10. The method of claim 8, further comprising, before the 
replicating, revising the index update distribution table 
based on an affinity of a respective index update slave for 
leaf blocks in tbe global index, the leaf blocks containing 
values of the index key in the range. 

11. The method of claim 1, further comprising, before 
receiving index maintenance records, distributing data 
manipulation operations among the plurality of data manipu- 
lation slaves. 

12. The method of claim 11, wherein the data manipula- 
tion operations are distributed based on partitions formed 
11 Ring a key other than tbe index key of said global index. 

13. The method of claim 11, wherein: 

the step of distributing data manipulation operations is 
performed by distributing data manipulation operations 
to a first number of data manipulation slaves; 

tbe step of distributing work associated with said update 
to said plurality of index update slaves is performed by 
distributing work to a second number of index update 
slavey and 

the first numt>er does not equal the second number. 

14. A method for updating a global index of an indexed 
table comprising: 

receiving data that identifies a value range, where the 
value range defines a range of values for an index key 
of the global index of the indexed table; 

reading an index key value horn a current index mainte- 
nance record; and 

updating tbe global index using the current index main- 
tenance record if the index key value falls within the 
value range. 

15. The method of claim 14, wherein said indexed table 
has a plurality of global indexes, the method further includ- 
ing: 

receiving an index identification associated with the value 
range; 

reading a current index identification from the current 
index maintenance record; and 

based on the current index identification, identifying 
which global index of the plurality of global indexes to 
update using said current index maintenance record. 

16. The method of claim 14 wherein: 

a phiraUty of index update slaves is used to update said 
global index; 

each index update slave of said plurality of index update 
slaves has a slave identification; 
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the steps of receiving data that identifies the vahie range, 
reacting the index key value, and updating the global 
index, are performed by a particular index update slave 
of said plurality of index update slaves; 

the particular index update slave has a particular slave 
identification; 

the step of receiving data that identifies the value range 
includes receiving an index update disthbution table 
from a coordinator process, wherein the index update 
distribution table maps value ranges for the index key 
of the global index to slave identifications; 

the method further includes the particular index update 
slave performing the steps of: 

locating the range in the index update distribution table 
that encompasses the index key value read from the 
current index maintenance record; 

determining if a slave identification associated with the 
range located corresponds to the particular slave 
identification; and 

if the slave identification corresponds to the particular 
slave identification, then updating the global index 
using the current index maintenance record. 

17. The method of claim 16, further comprising, if the 
slave identification does not correspond to the particular 
slave identification, then sending the current index mainte- 
nance record to a slave corresponding to the slave identifi- 
cation. 

18. The method of claim 16, wherein: 

the indexed table has a plurality of global indexes; 

the index update distribution table includes an index 

identification associated with each range; 
the method further comprises reading a current index 

identification from the current index maintenance 

record; and 

during said updating the global index, the particular index 
update slave determines which global index of the 
plurality of global indexes to update based on the 
current index identification. 

19. A method of parallelizing a data manipulation opera- 
tion on an indexed table, the method conqirising the steps of: 

based on a first partitioning criteria, dividing said data 
manipulation operation into a first set of work granules; 

distributing said first set of work granules to a first set of 
slaves; 

based on a second partitioning criteria, dividing a task of 
changing one or more indexes built on said indexed 
table into a second set of work granules; and 

distributing said second set of work granules to a second 
set of slaves; 

wherein said task of changing said one or more indexes 
includes updatii^ said one or more indexes to reflect 
changes made to said indexed table by said data 
manipulation operation; and 

wherein said first partitioning criteria is different than said 
second partitioning criteria. 

20. The method of claim 19 wherein the task of changing 
one or more indexes is divided based on index key value 
ranges. 

21. The method of claim 19 wherein the first set of slaves 
is not identical to said second set of slaves. 

22. The method of claim 21 wherein the first set of slaves 
includes a different number of slaves than said second set of 
slaves. 

23. The method of claim 21 wherein the first set of slaves 
includes a slave running on a node that does not execute any 
slaves of said second set of slaves. 
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24. The method of claim 21 wherein the second set of 
slaves includes a slave rtmning on a node that does not 
execute any slaves of said first set of slaves. 

25. The method of claim 20 further comprising the step of 
establishing the index key value ranges based on a sampling 
of index maintenance records generated by said first set of 
slaves. 

26. A computer-readable medium bearing instructions 
ananged to cause one or more processors to perform: 

receiving index maintenance records from a plurality of 
data manipulation slaves for an indexed table of a 
database, wherein each index maintenance record 
includes a value for an index key of a global index for 
the indexed table; 

computing a plurality of ranges for values of the index key 
received in the index maintenance records; 

assigning each range of the plurality of ranges to a 
respective index update slave of a plurality of index 
update slaves; and 

distributing work associated with an update of said global 
index to said plurality of index update slaves based on 
the plurality of ranges assigned to said plurality of 
index update slaves. 

27. A computer-readable medium beariag instructions 
arranged to cause one or more processors to perform: 

receiving data that identifies a value range, where the 
value range defines a range of values for an index key 
of a global index of an indexed table; 

reading an index key value from a current index mainte- 
nance record; and 

updating the global index using the current index main- 
tenance record if the index key value falls within the 
value range. 

28. A computer-readable medium bearing instructions 
arranged to cause one or more processors to perform: 

based on a first partitioning criteria, dividing a data 
manipulation operation into a first set of work granules; 

distributing said first set of woric granules to a first set of 
slaves; 

based on a second partitioning criteria, dividing a task of 
changing one or more indexes built on an indexed table 
into a second set of work granules; and 

distributing said second set of worit granules to a second 
set of slaves; 

wherein said task of changing said one or more indexes 
includes updatir^ said one or more indexes to reflect 
changes made to said indexed table by said data 
manipulation operation; and 

wherein said first partitioning criteria is different than said 
second partitioning criteria. 

29. A computer-readable medium bearitg information for 
use by one or more processors that are participating in 
execution of a global index update, the information com- 
prising: 

an index maintenance record; 

the index maintenance record including 

an index identification that identifies a global index 

requiring change; and 
an index key value of the global index, the index key 
value indicating an index entry requiring change 
within the global index. 

30. A computer-readable medium bearing information for 
use by one or more processors that are participating in 
execution of a global index update, the information cam- 
prising: 
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an index update distribution record; 
wherein the index update distribution record includes 
data that identifies a range of values, wherein the range 
of values is for an index key of a global index of an 
indexed table; and ^ 
a slave identification, wherein said slave identification 
identifies a slave responsible for performing updates 
to said global index for index entries associated with 
key values that fall within said range of values. 

31. The computer-readable mediiim of claim 30, the index 
update distribution record further comprising an index iden- 
tification for indicating one global index of a plurality of 
global indexes. 

32. A system for maintaining a global index of a database 

in response to parallel data manipulation operations, the ^5 
system comprising: 

a plurality of prtx:essing nodes; 

a network connecting the plurality of nodes; 

an iiKiexed table having a global index based on an index 20 
key residing on at least one of the plurality of nodes; 

a coordiiiator process nmning on a first node of the 
plurality of nodes; 

a plurality of data manipulation slaves, each data manipu- 
lation slave on a respective node of the plurality of 
iKxles; and 

a plurality of index update slaves, each index update slave 
on a respective node of the plurality of nodes; 

wherein, the coordinator process is configured to distrib- 
ute data manipulation operations among the plurality of 
data manipulation slaves, to receive index maintenance 
records from the plurality of data manipulation slaves, 
each record including an index key value, to compute 
a plurality of ranges for values of the index key 
receiv&d in the index maintenance records, and to send 
each range of the plurality of ranges to a respective 
index update slave; 
, wherein each data manipulation slave is configured to 
perform a data manipulation operation on a row of the 49 
database and to send an index mainteiumce record to 
the coordinator process; and 

wherein each index update slave is configured to receive 
a respective rai^e of the plurality of ranges, to read an 
index key vahie from a current index maintenance 45 
record, and to update the global index using the current 
index maintenance record if the index key value falls 
within the range of values. 

33. A method of using a global index of an indexed table 
comprising: 50 

executing a query using an index key for the gbbal index 

to provide a query result; 
determining whether an out-of-date flag indicates the 

global index is not current; 
if the out-of-date flag indicates the global index is not 

current, then 

searching through a set of sorted index maintenance 
records, each record containing a value of an index key 
for the global index and an index operation; and ^ 

modifying the query result according to an index main- 
tenance record that includes an index key value that 
matches the index key value used in the query. 

34. A computer-readable medium bearing instructions 
arranged to cause one or more processors to perform: ss 

executing a query using an index key for a gbbal index 
of an indexed table to provide a query result; 



determining whether an out-of-date flag indicates the 

global index is not current; 
if the out-of-<iate flag indicates the global index is not 

current, then 

searching through a set of sorted irxlex maintenance 
records, each record containing a value of an index 
key for the global index and an index operation; and 

modifying the query result according to an index main- 
tenance record that includes an index key value that 
matches the index key values used in the query. 

35. A computer-readable medium bearing instructions 
arranged to cause one or more processors to perform: 

distributing data manipulation operations among a plu- 
rality of data manipulation slaves, receiving index 
maintenance records from the plurality of data manipu- 
lation slaves, each record including an index key value 
for a global index of an indexed table, computing a 
plurality of ranges for values of the index key received 
in the index maintenance records, and sending each 
range of the plurality of ranges to a respective index 
update slave among a plurality of index update slaves, 
by a coordinator process; 

performing a data manipulation operation on a row of the 
database and sending an index maintenance record to 
the coordinator process, by a data manipulation slave; 
and 

receiving a respective range of the plurality of ranges, 
reading an index key value from a current index 
maintenance record, and updating the global index 
using the current index maintenance record if the index 
key value falls within the range of values, by an index 
update slave. 

36. The method of claim 1, the distribution further com- 
prising: 

generating an index distribution table including a range 

and a slave identification for each range of the plurality 

of ranges; and 
replicating the index distribution table for each data 

manipulation slave of the plurality of data manipulation 

slaves. 

37. The method of claim 1, wherein: 

the step of receiving index maintenance records further 
comprises inserting maintenance records into a buffer 
for the global index so the records are in order of 
increasing key value in the buffer; and 

the step of computing the pliurality of ranges further 
comprises determining said plurality of ranges based 
upon index key values contained in said buffer. 

38. The method of claim 37, wherein the step of deter- 
mining said plurality of ranges includes: 

determining a number of index update slaves, wherein the 
number of index update slaves is an index update 
degree of parallelism; 

determining a sample size equal to a number of index 
maintenance records in the buffer; and 

defining a range using an index key value in a index 
maintenance record at a specified position of the buffer, 
wherein the specified position is related to a multiple of 
the sample size divided by the index update degree of 
parallelism. 

39. The method of claim 14, wherein: 

a plurality of index update slaves is used to update said 
global index; 

each index update slave of said plurality of index update 
slaves has a slave identification; 
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the step of receiving data that identifies the value range 
includes receiving an index update distribution table 
from a coordinator process, wherein the index update 
distribution table maps value ranges for the index key 
of the global index to slave identifications; and ■ 5 
the method further comprises 
locating the range in the index update distribution table 
that encompasses the index key value read from the 
current index maintenance record, 
determining an slave identification associated with the 

range located, and 
sending the current index maintenance record to an 
index update slave correspoixling to the slave iden- 
tification associated with the range. 

40. The computCT-readable medium of claim 26 bearing ^5 
instructions arranged to cause one or more processors to 
further perform: 

generating an index distribution table including a range 

and a slave identification for each range of the plurality 

of ranges; and 
replicating the index distribution table for each data 

manipulation slave of the plurality of data manipulation 

slaves. 

41. The computer-readable medium of claim 26, wherein: ^5 
the step of receiviog index maintenance records further 

comprises inserting maintenance records into a buffer 
for the global index so the records are in order of 
increasing key vahie in the buffer; and 
the step of computing the plurality of ranges further 30 
comprises determining said plurality of ranges based 
upon index key values contained in said buffer. 

42. The computer- readable medium of claim 41, wherein 
the step of determining said plurality of ranges includes: 
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determining a number of index update slaves, wherein the 
number of index update slaves is an index update 
degree of parallelism; 

determining a sample size equal to a number of index 
maintenance records in the buffer; and 

defining a range using an index key value in a index 
maintenance record at a specified position of the buffer, 
wherein the ^cified position is related to a multiple of 
the sample size divided by the index update degree of 
parallelism. 

43. The computer-readable medium of claim 27, wherein: 
a plurality of index update slaves is used to update said 
global index; 

each index update slave of said plurality of index update 
slaves has a slave identification; 

the step of receiving data that identifies the value range 
includes receiving an index update distribution table 
from a coordinator process, wherein the index update 
distribution table maps value ranges for the index key 
of the global index to slave identifications; and 

the instructions are arranged to cause one or more pro- 
cessors to further perform 

locating the range in the index update distribution table 
that encompasses the index key value read from the 
current index maintenance record, 

determining an slave identification associated with the 
range located, and 

sending the current index maintenance record to an 
index update slave corresponding to the slave iden- 
tification associated with the range. 
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